109 research outputs found
Noise Models in Classification: Unified Nomenclature, Extended Taxonomy and Pragmatic Categorization
This paper presents the first review of noise models in classification covering both label and
attribute noise. Their study reveals the lack of a unified nomenclature in this field. In order to address
this problem, a tripartite nomenclature based on the structural analysis of existing noise models is
proposed. Additionally, a revision of their current taxonomies is carried out, which are combined
and updated to better reflect the nature of any model. Finally, a categorization of noise models is
proposed from a practical point of view depending on the characteristics of noise and the study
purpose. These contributions provide a variety of models to introduce noise, their characteristics
according to the proposed taxonomy and a unified way of naming them, which will facilitate their
identification and study, as well as the reproducibility of future research.
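As an illustration of what a label noise model does, the sketch below flips a fixed fraction of class labels, replacing each selected label with a different class chosen uniformly at random. This scheme is often called symmetric uniform label noise. The function name `inject_symmetric_label_noise` and its parameters are assumptions made for this example, not names from the paper or its nomenclature.

```python
import random

def inject_symmetric_label_noise(labels, noise_level, seed=0):
    """Return (noisy_labels, changed_indices) for a list of class labels.

    A fraction `noise_level` of the samples is selected at random and each
    selected label is replaced by a different class chosen uniformly.
    """
    rng = random.Random(seed)
    classes = sorted(set(labels))
    if len(classes) < 2:
        raise ValueError("need at least two classes to flip labels")
    n_noisy = round(noise_level * len(labels))
    noisy_idx = rng.sample(range(len(labels)), n_noisy)
    noisy = list(labels)
    for i in noisy_idx:
        # Only classes different from the original one are candidates,
        # so every selected sample really changes its label.
        alternatives = [c for c in classes if c != labels[i]]
        noisy[i] = rng.choice(alternatives)
    return noisy, sorted(noisy_idx)

# Hypothetical toy dataset: 100 labels over three classes, 10% noise.
labels = ["a"] * 50 + ["b"] * 30 + ["c"] * 20
noisy_labels, changed = inject_symmetric_label_noise(labels, noise_level=0.1)
n_changed = sum(o != n for o, n in zip(labels, noisy_labels))
```

Fixing the seed keeps the corrupted indices reproducible, which matters when the same noisy dataset has to be shared across experiments.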
Noise simulation in classification with the noisemodel R package: Applications analyzing the impact of errors with chemical data
Classification datasets created from chemical processes can be affected by
errors, which impair the accuracy of the models built. This fact highlights the
importance of analyzing the robustness of classifiers against different types
and levels of noise to know their behavior against potential errors. In this
context, noise models have been proposed to study noise-related phenomenology
in a controlled environment, allowing errors to be introduced into the data in
a supervised manner. This paper introduces the noisemodel R package, which
contains the first extensive implementation of noise models for classification
datasets, proposing it as a support tool to analyze the impact of errors related to
chemical data. It provides 72 noise models found in the specialized literature
that allow errors to be introduced in different ways in classes and attributes.
Each of them is properly documented and referenced, unifying their results
through a specific S3 class, which benefits from customized print, summary
and plot methods. The usage of the package is illustrated through four application
examples considering real-world chemical datasets, where errors are
prone to occur. The software presented will help to deepen the understanding
of the problem of noisy chemical data, as well as to develop new robust algorithms
and noise preprocessing methods properly adapted to different types of
errors in this scenario.
University of Granada/CBU
Impact of Regressand Stratification in Dataset Shift Caused by Cross-Validation
Data that have not been modeled cannot be correctly predicted. Under this assumption, this
research studies how k-fold cross-validation can introduce dataset shift in regression problems. This
shift means that the data distributions of the training and test sets differ, which in turn degrades
the estimation of model performance. Even though the stratification of the output variable is widely
used in the field of classification to reduce the impacts of dataset shift induced by cross-validation, its
use in regression is not widespread in the literature. This paper analyzes the consequences for dataset
shift of including different regressand stratification schemes in cross-validation with regression data.
The results obtained show that these allow for creating more similar training and test sets, reducing
the presence of dataset shift related to cross-validation. The bias and deviation of the performance
estimation results obtained by regression algorithms are improved using the highest numbers of
strata, as is the number of cross-validation repetitions necessary to obtain these better results.
MCIU/AEI/ERDF, UE PGC2018098860-B-I00
ERDF Operational Programme 2014-2020
Economy and Knowledge Council of the Regional Government of Andalusia, Spain
MCIN/AEI CEX2020-001105-M
A-FQM-345-UGR1
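One way to picture regressand stratification is the sketch below: samples are sorted by their target value, cut into contiguous strata, and the members of each stratum are dealt across the k folds so that every fold covers the whole target range. This is a minimal illustration under stated assumptions, not necessarily any of the schemes evaluated in the paper. The function name `stratified_regression_folds` and the parameters `k` and `n_strata` are hypothetical.

```python
import random

def stratified_regression_folds(y, k=5, n_strata=10, seed=0):
    """Split sample indices into k folds, stratifying on a numeric target y."""
    rng = random.Random(seed)
    # Sort indices by target value so each stratum covers a contiguous range.
    order = sorted(range(len(y)), key=lambda i: y[i])
    n = len(order)
    # Boundaries of n_strata nearly equal-sized chunks of the sorted indices.
    bounds = [round(s * n / n_strata) for s in range(n_strata + 1)]
    folds = [[] for _ in range(k)]
    next_fold = 0
    for s in range(n_strata):
        stratum = order[bounds[s]:bounds[s + 1]]
        rng.shuffle(stratum)
        # Deal the stratum round-robin; the counter carries over between
        # strata so that fold sizes differ by at most one sample.
        for idx in stratum:
            folds[next_fold % k].append(idx)
            next_fold += 1
    return folds

# Hypothetical toy target: 100 samples with a skewed distribution.
y = [i ** 2 for i in range(100)]
folds = stratified_regression_folds(y, k=5, n_strata=10)
fold_sizes = [len(f) for f in folds]
```

Each fold serves once as the test set while the remaining folds form the training set. Raising `n_strata` makes the target distributions of the folds more alike.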
Managing Borderline and Noisy Examples in Imbalanced Classification by Combining SMOTE with Ensemble Filtering
Imbalanced data pose a great difficulty for most classifier learning
algorithms. However, as recent works claim, class imbalance
is not a problem in itself and performance degradation is also associated
with other factors related to the distribution of the data, such as the presence of
noisy and borderline examples in the areas surrounding class boundaries.
This contribution proposes to extend SMOTE with a noise filter called
Iterative-Partitioning Filter (IPF), which can overcome these problems.
The properties of this proposal are discussed in a controlled experimental
study against SMOTE and its most well-known generalizations. The
results show that the new proposal performs better than existing SMOTE
generalizations for all these different scenarios.
Regional Projects P1O-TIC-06858
P11-TIC-9704
P12-TIC-2958 NCN-2013/11/B/5T6/00963
National Project TIN2011-28488
Spanish Governmen
Modulation of coaxial modal interferometers based on long period gratings in double cladding fibers
This paper reports on the dynamic modulation of coaxial interferometers based on two cascaded long period gratings written in double cladding fibers. The interferometer is modulated by a piezoelectric ceramic that stretches one of the gratings at tens of kHz, and the output light is intensity modulated with an efficiency of 97%. The device operates at 1530 nm, has a bandwidth of more than 50 nm, an insertion loss of 0.4 dB and a temperature drift of 0.11 nm/°C.
Decision-Tree-Based Approach for Pressure Ulcer Risk Assessment in Immobilized Patients
Applications where data mining tools are used in the fields of medicine and nursing are
becoming more and more frequent. Among them, decision trees have been applied to different health
data, such as those associated with pressure ulcers. Pressure ulcers represent a health problem with a
significant impact on the morbidity and mortality of immobilized patients and on the quality of life
of affected people and their families. Nurses provide comprehensive care to immobilized patients.
This fact results in an increased workload that can be a risk factor for the development of serious
health problems. Evidence-based healthcare practice that gives nursing professionals an objective
criterion is an essential aid for the application of preventive measures. In this work,
two ways for conducting a pressure ulcer risk assessment based on a decision tree approach are
provided. The first way is based on the activity and mobility characteristics of the Braden scale,
whilst the second way is based on the activity, mobility and skin moisture characteristics. The results
provided in this study endow nursing professionals with a foundation in relation to the use of their
experience and objective criteria for quick decision making regarding the risk of a patient developing
a pressure ulcer.
Consejeria de Salud, Junta de Andalucia (Fundacion Publica Andaluza Progreso y Salud) AP-0086-201
DispoCen. Much more than a program about lexical availability
DispoCen is a system for the analysis of lexical availability and lexical centrality. Although there are specific programs for calculating these indices, they tend to restrict the possibilities for analyzing and exploiting the data excessively, either because they are obsolete tools or because their code is excessively closed and inaccessible. DispoCen is based on a library of tools in R that makes the development of multiple applications and original models available to those who study the lexicon. In this paper we have included the code needed to run the analyses, thereby strengthening the replicability that supports the autonomous work of the research community. To facilitate access to the system, we also present a simple graphical utility that gives access to the most common analyses. As a sample of the possibilities of DispoCen, we include a specific section with proposals for analysis made with sociological filters.
This work was made possible by the funding and sponsorship of the Ministerio de Ciencia, Innovación y Universidades for the research project Agenda 2050. El español de Málaga: procesos de variación y cambio espaciales y sociales (PID2019-104982GB-C5-2)
A peripheral model of socio-economic formation on the Atlantic strip of Cádiz (Un modelo de formación económico-social periférico en la banda atlántica de Cádiz)
We present an overview of the Bronze Age on the island of San Fernando. We analyze the archaeological basis for studying the societies of the middle of the second millennium B.C. These are peripheral communities living in an island environment, where they exploit natural resources, chiefly molluscs, but also develop a significant agriculture, with a clear relationship of dependence on a nuclear center located in the interior countryside.
Fiber-Optic Aqueous Dipping Sensor Based on Coaxial-Michelson Modal Interferometers
Fiber-optic modal interferometers with a coaxial-Michelson configuration can be used to monitor aqueous solutions by simply dipping a few centimeters of a fiber tip. The fabrication of these sensors to work around 850 nm enables the use of compact, robust, and low-cost optical spectrum analyzers. The use of this type of portable sensor system to monitor sewage treatment plants is shown.
Explanatory Models of Burnout Diagnosis Based on Personality Factors in Primary Care Nurses
Burnout in the primary care service takes place when there is a high level of interaction
between nurses and patients. Explanatory models based on psychological and personality related
variables provide an approximation to level changes in the three dimensions of the burnout syndrome.
A categorical-response ordinal logistic regression model, based on a quantitative, cross-sectional,
multicentre, descriptive study with 242 primary care nurses in the Andalusian Health Service in
Granada (Spain), is fitted for each dimension. The three models included all the variables
related to personality. The risk factor friendliness was significant at population level for the three
dimensions, whilst openness was never significant. Neuroticism was significant in the models related
to emotional exhaustion and depersonalization, whilst responsibility was significant for the models
of the depersonalization and personal accomplishment dimensions. Finally, extraversion was
also significant in the emotional exhaustion and personal accomplishment dimensions. The analysis
performed provides useful information that facilitates the diagnosis of burnout syndrome and the
tracking of its evolution in this group.
Junta de Andalucia P20_0062
- …